(54) Parallel searching technique

(57) A parallel query manager accepts a list of file extents to be searched and produces a number of
search lists, one for each disk to be searched. The query manager first uses a mapper to find out how the
database spaces are stored on disk. It then matches the search extent list with the mapping information to
determine which parts of which disks are to be searched. It then initiates several searches in parallel so
that all the affected disks can be kept busy at the same time. The query manager then checks for return
data on each stream, and merges the results.

Description

Background to the Invention 

This invention relates to a parallel searching technique. The invention is particularly, although not exclusively,
concerned with a technique for parallel searching of a relational database.

In a relational database management system (RDBMS), the database storage space comprises one or more files, 
which are typically stored on a number of disks. When searching a relational database, it is desirable for the RDBMS 
to be able to read each of these files as fast as possible. Various solutions to this have been proposed. 

In a first proposed solution, the file is split into fragments and these fragments are stored in separate data spaces.
When the file is to be searched, a separate thread is initiated for each fragment, and process scheduling, thread
scheduling and multiple processor hardware are used to keep all threads busy. The threads may or may not interfere with
one another, since the RDBMS does not know where the data is stored. Interference happens when two threads access
the same disk at the same time, which causes frequent head movements and reduces the data transfer rate.

In another proposed solution, the file is divided into partitions and the partitions are sent to different processes that
can be executed independently. The processes may or may not interfere with one another, since they do not know the
physical placement of the data.

In yet another proposed solution, a low level process monitors the input/output activity of users and, when it appears
that sequential access is being used, the process initiates large multi-block reads in anticipation of the application
requirement.

There are also "massively parallel" solutions where each set of disks has its own processor so that a search can 
be split between multiple processors. 

The object of the present invention is to provide an improved technique for parallel searching. 

Summary of the Invention

According to the invention there is provided a data processing system comprising a plurality of data storage units 
and an application which generates a search request specifying a list of data areas to be searched within one or more 
files, characterised by: 

(a) means for creating a mapping table, indicating the way in which the files are mapped on to the data storage units;

(b) means for utilising said list of data areas and said mapping table to create a plurality of search lists, one for
each of the data storage units, each search list identifying the data areas or parts thereof that are mapped on to
a respective one of the data storage units; and

(c) means for initiating a plurality of searches in parallel on respective data storage units, using the search lists.

Brief Description of the Drawings

Figure 1 is a block diagram of a computer system using a parallel searching technique in accordance with the 
invention. 

Figure 2 is a diagram showing a tree structure representing the mapping of a file on to a set of physical disks. 
Figure 3 is a flow chart showing the operation of a parallel search manager.

Figure 4 is a flow chart showing the operation of a routine forming part of the parallel search manager. 
Figure 5 is a block diagram of an alternative computer system using a parallel searching technique in accordance 
with the invention. 



Description of an Embodiment of the invention 

One parallel searching technique in accordance with the invention will now be described by way of example with 
reference to the accompanying drawings. 

Figure 1 shows a computer system comprising a host processor 10, having an operating system 11 and application
software 12. The host processor may, for example, be an ICL DRS 6000 processor, and the operating system may be
the ICL NX operating system.

The host processor has a number of SCSI channels 13 (two in this example), each of which connects to a number
of disk drives 14. Each SCSI channel also has a search accelerator unit 15 connected to it, for performing searches
on the data stored in the disks. The search accelerator units 15 are, in this example, ICL SCAFS units, supplied by
International Computers Limited.

The application software includes a relational database management system (RDBMS) 16, a query manager 17,
and a parallel search manager 18. The RDBMS and the query manager are conventional, and so will not be described
in detail. The parallel search manager will be described in more detail later.

The operating system includes a filestore manager and a logical volume manager 19, a SCAFS driver 20, a
scheduler 21, and a filestore mapper 22. The filestore and logical volume managers 19 may be conventional and so will not
be described in detail. The SCAFS driver 20 is a driver supplied by International Computers Limited, for interfacing
between the operating system and the SCAFS units. The filestore mapper 22 will be described in more detail below.

The RDBMS 16 manages a relational database, stored on the disks. The database storage space consists of a
number of files. Each file comprises a number of data areas, referred to as extents, each of which consists of a set
of contiguously addressed blocks.

The function of the filestore mapper 22 is to generate a mapping table, representing the way a specified file is
mapped on to the physical disks. When called, the mapper 22 interfaces with the filestore and logical volume managers
19 to retrieve mapping information about the file. It may also lock the file against relocation actions by other users. The
mapper 22 returns a mapping table, representing the mapping of the file as a tree structure, along with an indication
of the number of entries in the table.

The mapping table contains a sequence of entries, each representing a component in a tree structure. Each entry 
has the following fields: 



Level number 
Number of components 
Type 

Type-dependent data. 

Level number indicates the level of the component within the tree structure. Level 0 is the root of the tree. 
Number of components indicates how many components (if any) are attached to this component in the next level 
of the tree structure. 

Type indicates the type of component. The following types are defined:

file system : a file stored in a file system.

raw : a logical volume stored as all or part of a physical disk.

concat : a logical volume which is the concatenation of one or more component logical volumes.

striped : a logical volume which is striped, with a fixed stripe size, over a set of component logical volumes
of the same size. Striping assigns logically consecutive segments of a volume to a fixed set of
component volumes on a round robin basis - a, b, c, a, b, c...

mirrored : a logical volume which is mirrored, with the information being replicated over two or more logical
volumes of the same size.
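
The round robin assignment used by a striped volume can be illustrated with a short sketch. The function name `locate_in_stripe` and its parameters are hypothetical, and the sketch assumes byte offsets and a fixed stripe size; it is not taken from the logical volume manager itself.

```python
def locate_in_stripe(logical_offset, stripe_size, n_components):
    """Map a logical byte offset in a striped volume to
    (component index, offset within that component).

    Hypothetical helper: assumes segments of `stripe_size` bytes are dealt
    out to the components round robin (a, b, c, a, b, c, ...).
    """
    segment = logical_offset // stripe_size      # which logical segment holds the byte
    component = segment % n_components           # round robin choice of component
    row = segment // n_components                # complete rounds before this segment
    offset_in_component = row * stripe_size + logical_offset % stripe_size
    return component, offset_in_component


# Three components with a 64-byte stripe: list the component of each of six segments.
print([locate_in_stripe(seg * 64, 64, 3)[0] for seg in range(6)])
```

With three components the consecutive segments land on components 0, 1, 2 and then wrap round to 0 again, which is the a, b, c, a, b, c pattern.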



Type-dependent data is specified as follows: 



file system : name 

raw : disk name, offset and length 

concat : logical volume name, size 

striped : stripe size 

mirrored : size. 



The disk name indicates which channel the disk is attached to, and its SCSI address on that channel. 
For example, the mapping table for a file might contain the following entries: 



Level number   Number of components   Type     Type-dependent data

0              3                      concat   volume A, size=8Mb
1              0                      raw      disk 1, offset=0, length=2.5Mb
1              0                      raw      disk 2, offset=48k, length=3Mb
1              0                      raw      disk 3, offset=2Mb, length=2.5Mb
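
The example entries can be held as plain records and walked to find where each part of the logical volume lies. The sketch below is illustrative only: the class `MappingEntry`, the function `concat_fragments`, and the reading of "Mb" and "k" as binary units are assumptions, and only the concat-of-raw case from the example is handled.

```python
from dataclasses import dataclass

KB = 1024
MB = 1024 * KB  # assumption: the example's "Mb" and "k" are read as binary units


@dataclass
class MappingEntry:
    level: int          # depth in the tree; 0 is the root
    n_components: int   # components attached at the next level
    type: str           # "file system", "raw", "concat", "striped" or "mirrored"
    data: dict          # type-dependent data


# The example mapping table, one record per entry.
table = [
    MappingEntry(0, 3, "concat", {"volume": "A", "size": 8 * MB}),
    MappingEntry(1, 0, "raw", {"disk": "disk 1", "offset": 0, "length": int(2.5 * MB)}),
    MappingEntry(1, 0, "raw", {"disk": "disk 2", "offset": 48 * KB, "length": 3 * MB}),
    MappingEntry(1, 0, "raw", {"disk": "disk 3", "offset": 2 * MB, "length": int(2.5 * MB)}),
]


def concat_fragments(entries):
    """Return (disk, logical_start, logical_end, disk_offset) for a concat of raw volumes."""
    root, children = entries[0], entries[1:]
    if root.type != "concat" or len(children) != root.n_components:
        raise ValueError("sketch only handles a concat root followed by its components")
    fragments, logical = [], 0
    for child in children:
        if child.type != "raw":
            raise ValueError("sketch only handles raw components")
        length = child.data["length"]
        fragments.append((child.data["disk"], logical, logical + length, child.data["offset"]))
        logical += length  # the next component continues where this one ends
    return fragments


for fragment in concat_fragments(table):
    print(fragment)
```

The three raw lengths add up to the 8Mb size recorded for volume A, so the fragments cover the logical volume without gaps.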



This mapping table represents a file which is mapped to a logical volume A, which in turn is mapped to three
raw volumes, each on a separate physical disk. This mapping can be represented as a tree structure (see Figure 2).

- A file reference (file descriptor or full file name).

- The offset of the start of the extent relative to the start of the file. 

- The length of the data area to be searched in this extent. 

The RDBMS then sends a bulk input request to the parallel search manager. The bulk input request comprises
two items:

- the number of extents to be searched
- a pointer to the dataspace extent list.

Figure 3 shows the operation of the parallel search manager 18 when it receives a bulk input request.

(Step 31) The parallel search manager first scans the dataspace extent list to identify which files are referenced
in this list. This step generates a file list, comprising the following information:

- the number of files
- a set of pointers to the files.
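
The scan that produces the file list can be sketched as follows. `Extent` and `build_file_list` are hypothetical names; the sketch assumes each entry in the extent list carries a file reference, an offset and a length, and it returns file names where the text speaks of pointers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    file: str     # file reference (descriptor or full file name)
    offset: int   # start of the extent relative to the start of the file
    length: int   # length of the data area to be searched


def build_file_list(extent_list):
    """Return the distinct files referenced by the extent list, in first-seen order."""
    seen = {}
    for extent in extent_list:
        seen.setdefault(extent.file, None)  # dicts preserve insertion order
    return list(seen)


request = [Extent("/db/a", 0, 4096), Extent("/db/b", 8192, 4096), Extent("/db/a", 16384, 8192)]
files = build_file_list(request)
print(len(files), files)
```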

(Step 32) For each file in the file list, the parallel search manager makes a request to the mapper 22 [...]

[...] on the currently selected disk. The inner loop calls a routine that creates a search list, identifying the
extents, or parts of extents, of the selected file that map on to the selected disk. [...]

This search list routine is described in detail below, with reference to Figure 4.
When the search lists have been created, each list may be independently passed to the SCAFS driver 20, to initiate
a search through the specified areas on the specified disk. The SCAFS driver converts [...] to disk offsets (disk
addresses) and passes the [...] to the respective SCAFS units. The searches are initiated in parallel, so that all
the affected disks are busy at the same time. Each SCAFS unit performs the requested searches and returns a stream
of selected rows or records to the host.
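
The effect of starting one search per disk and then collecting the returned rows can be imitated in software. The sketch below is not the SCAFS interface: `search_disk`, `parallel_search`, the in-memory `disks` dictionary and the use of a thread pool are all assumptions made purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the data on each disk: disk name -> list of rows (hypothetical data).
disks = {
    "disk 1": ["apple", "banana", "cherry"],
    "disk 2": ["avocado", "blueberry"],
    "disk 3": ["apricot", "cranberry"],
}


def search_disk(disk, search_list, predicate):
    """Search the listed (start, end) row ranges of one disk and return the matching rows."""
    rows = disks[disk]
    return [row for start, end in search_list for row in rows[start:end] if predicate(row)]


def parallel_search(search_lists, predicate):
    """Start one search per disk so that all affected disks are busy at once, then merge."""
    with ThreadPoolExecutor(max_workers=max(1, len(search_lists))) as pool:
        futures = {disk: pool.submit(search_disk, disk, entries, predicate)
                   for disk, entries in search_lists.items()}
        merged = []
        for disk in search_lists:          # collect each result stream in a fixed order
            merged.extend(futures[disk].result())
    return merged


search_lists = {"disk 1": [(0, 3)], "disk 2": [(0, 1)], "disk 3": [(0, 2)]}
print(parallel_search(search_lists, lambda row: row.startswith("a")))
```

All searches are submitted before any result is awaited, which is what keeps every disk busy at the same time; the merge step here is a simple concatenation in disk order.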
(Step 45) The routine then checks whether the byte count is greater than the offset value of the currently selected
extent, i.e. whether the extent maps (at least partially) into the fragment. If so, the routine proceeds to Step 46; otherwise,
it returns to Step 43 above, to search for the next fragment.

(Step 46) The routine checks whether the fragment is on the target disk (i.e. the currently selected disk). If so, the
routine proceeds to Step 47; otherwise it proceeds to Step 50.

(Step 47) If the fragment is on the target disk, the routine identifies the overlap between the extent and the fragment. 
It then creates an entry in the output search list, this entry including the file name, offset and length of the overlap area. 
(As an added refinement the disk offset can also be included in the entry, and used to order the disk search extents 
and minimise disk head movement.) 
(Step 48) The routine then checks whether the currently selected extent has been exhausted, i.e. whether the byte
count is greater than the sum of the extent's offset and length. If so, the routine proceeds to Step 49. If, on the other
hand, the extent has not been exhausted (i.e. the extent continues into the next fragment), the routine returns to Step
43 above, so as to search for the next fragment.

(Step 49) If the currently selected extent has been exhausted, the next extent in the file extent list is now selected,
and the routine returns to Step 45 above.

(Step 50) If the currently selected fragment is not on the target disk, the procedure checks whether the byte count
is greater than the sum of the extent's offset and length, i.e. whether the currently selected extent terminates within
the fragment. If so, the routine proceeds to Step 51; otherwise, it returns to Step 43 above, so as to search for the next
fragment.

(Step 51) The routine selects the next extent in the file extent list, and returns to Step 50.

The above loop, comprising Steps 43-51, is repeated until it is found (at Step 49 or 51) that there are no more 
extents in the file extent list to be processed. If the end of the mapping table is reached before this, (i.e. there are no 
more fragments to process), an error has occurred. 
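
The outcome of this routine, together with an outer loop in respect of each disk and an inner loop in respect of each file, can be approximated with a direct interval intersection. This is a simplified sketch, not the step-by-step flow of Figure 4: the names `create_search_list` and `build_search_lists`, the tuple layouts, and the sort by disk offset (the refinement mentioned at Step 47) are assumptions.

```python
def create_search_list(file_name, fragments, extents, target_disk):
    """Find the parts of a file's extents that lie on target_disk.

    fragments: (disk, file_start, file_end, disk_offset) tuples covering the file.
    extents:   (offset, length) pairs relative to the start of the file.
    Returns entries of (file_name, file_offset, length, disk_offset).
    """
    entries = []
    for disk, frag_start, frag_end, disk_offset in fragments:
        if disk != target_disk:
            continue                                   # fragment is not on the target disk
        for offset, length in extents:
            start = max(offset, frag_start)            # overlap of extent and fragment
            end = min(offset + length, frag_end)
            if start < end:
                entries.append((file_name, start, end - start,
                                disk_offset + (start - frag_start)))
    entries.sort(key=lambda entry: entry[3])           # ascending disk offset within this file
    return entries


def build_search_lists(disks, files):
    """Outer loop over disks, inner loop over files; returns one search list per disk.

    files: file name -> (fragments, extents).
    """
    search_lists = {}
    for disk in disks:
        search_lists[disk] = []
        for name, (fragments, extents) in files.items():
            search_lists[disk].extend(create_search_list(name, fragments, extents, disk))
    return search_lists


fragments = [("disk 1", 0, 100, 0), ("disk 2", 100, 250, 48)]
extents = [(80, 40), (200, 30)]                        # the first extent spans both fragments
print(build_search_lists(["disk 1", "disk 2"], {"f": (fragments, extents)}))
```

In the example the first extent straddles the fragment boundary, so it contributes one entry to each disk's search list, while the second extent lies wholly on the second disk.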

For mirrored volumes the search for the fragment follows one mirror only. The actual decision on the mirror to be
searched is taken by the operating system.

Summary 

In summary, the parallel query manager accepts a list of extents to be searched and organises them into an efficient
sequence for searching. This enables an extremely high data search rate to be achieved.
This solution has the following benefits:

- It is architecturally simpler than multi-process or multi-threading solutions.

- It takes account of physical data placement so can be optimised to minimise disk head movement and maximise
data input rates.

- The search activity can be scheduled to make best use of system resources.

- It does not rely on availability of multiple processors and is quite effective even when only one processor is available.

Some possible modifications 


It will be appreciated that many modifications may be made to the system described above without departing from 
the scope of the present invention. 

For example, instead of using search processors, the searching may be done by the RDBMS itself. In this case,
the parallel search manager would be used in the same way as described above, to create lists of search areas for
each disk. A bulk input manager, resident in the host processor, would then use these lists to drive a series of
asynchronous block reads through the disk driver, so as to read the required data into the host, for searching by the RDBMS.
This possibility is illustrated in Figure 5.
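
The host-based variant can be sketched with ordinary file reads issued concurrently. This is an illustration under stated assumptions rather than the bulk input manager itself: `read_area`, `bulk_read` and the use of Python's `asyncio.to_thread` (Python 3.9 or later) stand in for asynchronous block reads through a disk driver, and a temporary file stands in for a disk.

```python
import asyncio
import os
import tempfile


def read_area(path, offset, length):
    """Blocking read of one search area (hypothetical stand-in for a block read)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


async def bulk_read(search_lists):
    """Issue every read in every per-disk list concurrently; return the data per disk."""
    async def read_disk(entries):
        return await asyncio.gather(
            *(asyncio.to_thread(read_area, path, off, length) for path, off, length in entries))

    names = list(search_lists)
    results = await asyncio.gather(*(read_disk(search_lists[name]) for name in names))
    return dict(zip(names, results))


# Demo: one temporary file plays the part of a disk.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789abcdefghij")
data = asyncio.run(bulk_read({"disk 1": [(tmp.name, 0, 4), (tmp.name, 10, 3)]}))
os.unlink(tmp.name)
print(data)
```

The data read for each disk comes back in the order of that disk's search list, ready to be searched on the host.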

Claims

1. A data processing system comprising a plurality of data storage units (14) and an application (17) which generates
a search request specifying a list of data areas to be searched within one or more files, characterised by:

(a) means (22) for creating a mapping table, indicating the way in which the files are mapped on to the data
storage units;

(b) means (18) for utilising said list of data areas and said mapping table to create a plurality of search lists,
one for each of the data storage units, each search list identifying the data areas, or parts thereof, that are
mapped on to a respective one of the data storage units; and

(c) means (18) for initiating a plurality of searches in parallel on respective data storage units using the search
lists.

2. A data processing system according to Claim 1 wherein said means for utilising said list of data areas and said
mapping table to create a plurality of search lists comprises:

(a) means for performing an outer loop, in respect of each of said data storage units, and an inner loop in 
respect of each of said files; 

(b) means within said inner loop for finding the data areas or parts thereof within a particular target file that 
map on to a particular target disk. 

3. A data processing system according to Claim 1 or 2 further comprising means for merging the results of said
plurality of searches and for returning the merged results to said application.

4. A data processing system according to any preceding claim wherein said application is a relational database
management system.

5. A data processing system according to any preceding claim wherein said data storage units are magnetic disk
storage units.

6. A data processing system according to any preceding claim wherein said means for creating a mapping table is
resident within the operating system of the data processing system.

7. A data processing system according to any preceding claim further including a plurality of search accelerator units 
for performing said searches in parallel. 

8. A parallel search manager comprising:

(a) means for receiving a search request specifying a list of data areas to be searched within one or more files; 

(b) means for creating a mapping table, indicating the way in which the files are mapped on to a plurality of 
data storage units; 

(c) means for utilising said list of data areas and said mapping table to create a plurality of search lists, one
for each of the data storage units, each search list identifying the data areas or parts thereof that are mapped 
on to a respective one of the data storage units; and 

(d) means for initiating a plurality of searches in parallel on respective data storage units, using the search lists.